AITopics | seed entity

Collaborating Authors

seed entity

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Ground-Truth Subgraphs for Better Training and Evaluation of Knowledge Graph Augmented LLMs

Cattaneo, Alberto, Luschi, Carlo, Justus, Daniel

arXiv.org Artificial IntelligenceDec-5-2025

Retrieval of information from graph-structured knowledge bases represents a promising direction for improving the factuality of LLMs. While various solutions have been proposed, a comparison of methods is difficult due to the lack of challenging QA datasets with ground-truth targets for graph retrieval. We present SynthKGQA, an LLM-powered framework for generating high-quality Knowledge Graph Question Answering datasets from any Knowledge Graph, providing the full set of ground-truth facts in the KG to reason over questions. We show how, in addition to enabling more informative benchmarking of KG retrievers, the data produced with SynthKGQA also allows us to train better models.We apply SynthKGQA to Wikidata to generate GTSQA, a new dataset designed to test zero-shot generalization abilities of KG retrievers with respect to unseen graph structures and relation types, and benchmark popular solutions for KG-augmented LLMs on it.

large language model, machine learning, subgraph, (21 more...)

arXiv.org Artificial Intelligence

2511.04473

Country:

Oceania > Nauru (0.15)
Europe > Austria > Vienna (0.14)
Europe > Portugal > Lisbon > Lisbon (0.04)
(15 more...)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment > Games > Computer Games (0.68)
Leisure & Entertainment > Sports > Soccer (0.68)
Media > Film (0.68)
Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

BMGQ: A Bottom-up Method for Generating Complex Multi-hop Reasoning Questions from Semi-structured Data

Qiu, Bingsen, Liu, Zijian, Liu, Xiao, Wang, Bingjie, Zhang, Feier, Qin, Yixuan, Li, Chunyan, Yang, Haoshen, Gao, Zeren

arXiv.org Artificial IntelligenceNov-26-2025

Building training-ready multi-hop question answering (QA) datasets that truly stress a model's retrieval and reasoning abilities remains highly challenging recently. While there have been a few recent evaluation datasets that capture the characteristics of hard-to-search but easy-to-verify problems -- requiring the integration of ambiguous, indirect, and cross-domain cues -- these data resources remain scarce and are mostly designed for evaluation, making them unsuitable for supervised fine-tuning (SFT) or reinforcement learning (RL). Meanwhile, manually curating non-trivially retrievable questions -- where answers cannot be found through a single direct query but instead require multi-hop reasoning over oblique and loosely connected evidence -- incurs prohibitive human costs and fails to scale, creating a critical data bottleneck for training high-capability retrieval-and-reasoning agents. To address this, we present BMGQ, a bottom-up automated method for generating high-difficulty, training-ready multi-hop questions from semi-structured knowledge sources. The BMGQ system (i) grows diverse, logically labeled evidence clusters through Natural Language Inference (NLI)-based relation typing and diversity-aware expansion; (ii) applies reverse question construction to compose oblique cues so that isolated signals are underinformative but their combination uniquely identifies the target entity; and (iii) enforces quality with a two-step evaluation pipeline that combines multi-model consensus filtering with structured constraint decomposition and evidence-based matching. The result is a scalable process that yields complex, retrieval-resistant yet verifiable questions suitable for SFT/RL training as well as challenging evaluation, substantially reducing human curation effort while preserving the difficulty profile of strong evaluation benchmarks.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2510.24151

Country:

North America > United States (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
Pacific Ocean (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Transportation > Passenger (0.94)
Transportation > Air (0.94)
Consumer Products & Services > Travel (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Add feedback

Dialogue Benchmark Generation from Knowledge Graphs with Cost-Effective Retrieval-Augmented LLMs

Omar, Reham, Mangukiya, Omij, Mansour, Essam

arXiv.org Artificial IntelligenceJan-16-2025

Dialogue benchmarks are crucial in training and evaluating chatbots engaging in domain-specific conversations. Knowledge graphs (KGs) represent semantically rich and well-organized data spanning various domains, such as DBLP, DBpedia, and YAGO. Traditionally, dialogue benchmarks have been manually created from documents, neglecting the potential of KGs in automating this process. Some question-answering benchmarks are automatically generated using extensive preprocessing from KGs, but they do not support dialogue generation. This paper introduces Chatty-Gen, a novel multi-stage retrieval-augmented generation platform for automatically generating high-quality dialogue benchmarks tailored to a specific domain using a KG. Chatty-Gen decomposes the generation process into manageable stages and uses assertion rules for automatic validation between stages. Our approach enables control over intermediate results to prevent time-consuming restarts due to hallucinations. It also reduces reliance on costly and more powerful commercial LLMs. Chatty-Gen eliminates upfront processing of the entire KG using efficient query-based retrieval to find representative subgraphs based on the dialogue context. Our experiments with several real and large KGs demonstrate that Chatty-Gen significantly outperforms state-of-the-art systems and ensures consistent model and system performance across multiple LLMs of diverse capabilities, such as GPT-4o, Gemini 1.5, Llama 3, and Mistral.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2501.09928

Country:

North America > Canada (0.04)
North America > United States > New York > Tompkins County > Ithaca (0.04)
North America > United States > New York > New York County > New York City (0.04)
Asia > China > Liaoning Province > Shenyang (0.04)

Genre:

Research Report (0.64)
Workflow (0.46)

Industry: Education > Educational Technology > Educational Software > Computer Based Training (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Exploring the Implicit Semantic Ability of Multimodal Large Language Models: A Pilot Study on Entity Set Expansion

Wang, Hebin, Li, Yangning, Li, Yinghui, Zheng, Hai-Tao, Jiang, Wenhao, Kim, Hong-Gee

arXiv.org Artificial IntelligenceDec-31-2024

The rapid development of multimodal large language models (MLLMs) has brought significant improvements to a wide range of tasks in real-world applications. However, LLMs still exhibit certain limitations in extracting implicit semantic information. In this paper, we apply MLLMs to the Multi-modal Entity Set Expansion (MESE) task, which aims to expand a handful of seed entities with new entities belonging to the same semantic class, and multi-modal information is provided with each entity. We explore the capabilities of MLLMs to understand implicit semantic information at the entity-level granularity through the MESE task, introducing a listwise ranking method LUSAR that maps local scores to global rankings. Our LUSAR demonstrates significant improvements in MLLM's performance on the MESE task, marking the first use of generative MLLM for ESE tasks and extending the applicability of listwise ranking.

arxiv preprint arxiv, candidate entity, seed entity, (14 more...)

arXiv.org Artificial Intelligence

2501.0033

Country:

Asia > China > Guangdong Province > Shenzhen (0.05)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(3 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

KGPrune: a Web Application to Extract Subgraphs of Interest from Wikidata with Analogical Pruning

Monnin, Pierre, Nousradine, Cherif-Hassan, Jarnac, Lucas, Zuckerman, Laurel, Couceiro, Miguel

arXiv.org Artificial IntelligenceAug-26-2024

Knowledge graphs (KGs) have become ubiquitous publicly available knowledge sources, and are nowadays covering an ever increasing array of domains. However, not all knowledge represented is useful or pertaining when considering a new application or specific task. Also, due to their increasing size, handling large KGs in their entirety entails scalability issues. These two aspects asks for efficient methods to extract subgraphs of interest from existing KGs. To this aim, we introduce KGPrune, a Web Application that, given seed entities of interest and properties to traverse, extracts their neighboring subgraphs from Wikidata. To avoid topical drift, KGPrune relies on a frugal pruning algorithm based on analogical reasoning to only keep relevant neighbors while pruning irrelevant ones. The interest of KGPrune is illustrated by two concrete applications, namely, bootstrapping an enterprise KG and extracting knowledge related to looted artworks.

kgp rune, seed entity, wikidata, (12 more...)

arXiv.org Artificial Intelligence

2408.14658

Country:

Europe > Portugal > Lisbon > Lisbon (0.04)
Europe > Greece (0.04)
North America > United States > New York > New York County > New York City (0.04)
(11 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Knowledge Management > Knowledge Engineering (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.47)

Add feedback

Seed-Guided Fine-Grained Entity Typing in Science and Engineering Domains

Zhang, Yu, Zhang, Yunyi, Shen, Yanzhen, Deng, Yu, Popa, Lucian, Shwartz, Larisa, Zhai, ChengXiang, Han, Jiawei

arXiv.org Artificial IntelligenceJan-23-2024

Accurately typing entity mentions from text segments is a fundamental task for various natural language processing applications. Many previous approaches rely on massive human-annotated data to perform entity typing. Nevertheless, collecting such data in highly specialized science and engineering domains (e.g., software engineering and security) can be time-consuming and costly, without mentioning the domain gaps between training and inference data if the model needs to be applied to confidential datasets. In this paper, we study the task of seed-guided fine-grained entity typing in science and engineering domains, which takes the name and a few seed entities for each entity type as the only supervision and aims to classify new entity mentions into both seen and unseen types (i.e., those without seed entities). To solve this problem, we propose SEType which first enriches the weak supervision by finding more entities for each seen type from an unlabeled corpus using the contextualized representations of pre-trained language models. It then matches the enriched entities to unlabeled text to get pseudo-labeled samples and trains a textual entailment model that can make inferences for both seen and unseen types. Extensive experiments on two datasets covering four domains demonstrate the effectiveness of SEType in comparison with various baselines.

entity typing, seed entity, set ype, (15 more...)

arXiv.org Artificial Intelligence

2401.13129

Country:

North America > United States > Illinois > Champaign County > Urbana (0.04)
North America > United States > California > Santa Clara County > San Jose (0.04)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (0.47)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)

Add feedback

From Retrieval to Generation: Efficient and Effective Entity Set Expansion

Huang, Shulin, Ma, Shirong, Li, Yangning, Li, Yinghui, Jiang, Yong, Zheng, Hai-Tao, Shen, Ying

arXiv.org Artificial IntelligenceOct-31-2023

Entity Set Expansion (ESE) is a critical task aiming at expanding entities of the target semantic class described by seed entities. Most existing ESE methods are retrieval-based frameworks that need to extract contextual features of entities and calculate the similarity between seed entities and candidate entities. To achieve the two purposes, they iteratively traverse the corpus and the entity vocabulary, resulting in poor efficiency and scalability. Experimental results indicate that the time consumed by the retrieval-based ESE methods increases linearly with entity vocabulary and corpus size. In this paper, we firstly propose Generative Entity Set Expansion (GenExpan) framework, which utilizes a generative pre-trained auto-regressive language model to accomplish ESE task. Specifically, a prefix tree is employed to guarantee the validity of entity generation, and automatically generated class names are adopted to guide the model to generate target entities. Moreover, we propose Knowledge Calibration and Generative Ranking to further bridge the gap between generic knowledge of the language model and the goal of ESE task. For efficiency, expansion time consumed by GenExpan is independent of entity vocabulary and corpus size, and GenExpan achieves an average 600% speedup compared to strong baselines. For expansion effectiveness, our framework outperforms previous state-of-the-art ESE methods.

expansion, language model, seed entity, (16 more...)

arXiv.org Artificial Intelligence

2304.03531

Country:

Asia > China > Guangdong Province > Shenzhen (0.05)
North America > United States > Nevada (0.05)
North America > United States > Texas (0.05)
(28 more...)

Genre: Research Report (1.00)

Industry: Education > Educational Setting > Higher Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Relevant Entity Selection: Knowledge Graph Bootstrapping via Zero-Shot Analogical Pruning

Jarnac, Lucas, Couceiro, Miguel, Monnin, Pierre

arXiv.org Artificial IntelligenceAug-16-2023

Knowledge Graph Construction (KGC) can be seen as an iterative process starting from a high quality nucleus that is refined by knowledge extraction approaches in a virtuous loop. Such a nucleus can be obtained from knowledge existing in an open KG like Wikidata. However, due to the size of such generic KGs, integrating them as a whole may entail irrelevant content and scalability issues. We propose an analogy-based approach that starts from seed entities of interest in a generic KG, and keeps or prunes their neighboring entities. We evaluate our approach on Wikidata through two manually labeled datasets that contain either domain-homogeneous or -heterogeneous seed entities. We empirically show that our analogy-based approach outperforms LSTM, Random Forest, SVM, and MLP, with a drastically lower number of parameters. We also evaluate its generalization potential in a transfer learning setting. These results advocate for the further integration of analogy-based inference in tasks related to the KG lifecycle.

artificial intelligence, machine learning, seed entity, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3583780.3615030

2306.16296

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Austria > Vienna (0.14)
Europe > France > Grand Est > Meurthe-et-Moselle > Nancy (0.04)
(25 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Automatic Context Pattern Generation for Entity Set Expansion

Li, Yinghui, Huang, Shulin, Zhang, Xinwei, Zhou, Qingyu, Li, Yangning, Liu, Ruiyang, Cao, Yunbo, Zheng, Hai-Tao, Shen, Ying

arXiv.org Artificial IntelligenceMar-8-2023

Entity Set Expansion (ESE) is a valuable task that aims to find entities of the target semantic class described by given seed entities. Various Natural Language Processing (NLP) and Information Retrieval (IR) downstream applications have benefited from ESE due to its ability to discover knowledge. Although existing corpus-based ESE methods have achieved great progress, they still rely on corpora with high-quality entity information annotated, because most of them need to obtain the context patterns through the position of the entity in a sentence. Therefore, the quality of the given corpora and their entity annotation has become the bottleneck that limits the performance of such methods. To overcome this dilemma and make the ESE models free from the dependence on entity annotation, our work aims to explore a new ESE paradigm, namely corpus-independent ESE. Specifically, we devise a context pattern generation module that utilizes autoregressive language models (e.g., GPT-2) to automatically generate high-quality context patterns for entities. In addition, we propose the GAPA, a novel ESE framework that leverages the aforementioned GenerAted PAtterns to expand target entities. Extensive experiments and detailed analyses on three widely used datasets demonstrate the effectiveness of our method. All the codes of our experiments are available at https://github.com/geekjuruo/GAPA.

context pattern, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2207.08087

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Georgia (0.14)
North America > United States > Ohio (0.05)
(42 more...)

Genre: Research Report (1.00)

Industry:

Education (0.94)
Health & Medicine (0.93)
Leisure & Entertainment (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)

Add feedback

Crawling the Internal Knowledge-Base of Language Models

Cohen, Roi, Geva, Mor, Berant, Jonathan, Globerson, Amir

arXiv.org Artificial IntelligenceJan-30-2023

Language models are trained on large volumes of text, and as a result their parameters might contain a significant body of factual knowledge. Any downstream task performed by these models implicitly builds on these facts, and thus it is highly desirable to have means for representing this body of knowledge in an interpretable way. However, there is currently no mechanism for such a representation. Here, we propose to address this goal by extracting a knowledge-graph of facts from a given language model. We describe a procedure for ``crawling'' the internal knowledge-base of a language model. Specifically, given a seed entity, we expand a knowledge-graph around it. The crawling procedure is decomposed into sub-tasks, realized through specially designed prompts that control for both precision (i.e., that no wrong facts are generated) and recall (i.e., the number of facts generated). We evaluate our approach on graphs crawled starting from dozens of seed entities, and show it yields high precision graphs (82-92%), while emitting a reasonable number of facts per entity.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2301.1281

Country:

Europe > Italy (0.04)
Europe > Austria > Vienna (0.04)
Asia > Philippines (0.04)
(12 more...)

Genre: Research Report (0.64)

Industry:

Government (0.93)
Media (0.93)
Leisure & Entertainment > Sports (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.55)

Add feedback